Research Question
How does proficiency in reading and language arts vary across states for high school students? How does it differ by student characteristics such as English Learner status, disability status, and low-income status?
Data Source: State Assessments in Reading/Language Arts and Mathematics, School Year 2018-19 EDFacts Data Documentation. U.S. Department of Education, Washington, DC: EDFacts. Retrieved February 2, 2022, from http://www.ed.gov/edfacts
I decided to use a map to visualize the achievement gap between students with disabilities, EL students, and low-income students and their peers. This way, the audience can compare each state's math achievement across all categories and gauge the size of each gap from the heatmap colors.
The goal of these bar plots is to show the achievement gap between students with disabilities, EL students, and low-income students and their peers. In the previous set of plots, each bar was shown on its own, so I added the all-students average in grey as a background for easier comparison. However, I wanted people to be able to compare all of the differences at once, in one frame.
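The gap in these plots can be read as each subgroup's percentage at or above proficiency minus the all-students percentage shown as the grey background. A minimal sketch of that arithmetic, assuming a hypothetical per-state record keyed by group label (the keys and numbers are illustrative, not EDFacts values):

```python
from typing import Dict


def achievement_gaps(state_pct: Dict[str, float], reference: str = "all_students") -> Dict[str, float]:
    """Gap, in percentage points, between each subgroup and the reference group.

    state_pct maps a group label (hypothetical keys such as "all_students", "ell",
    "swd", "low_income") to the percent of students at or above proficiency in one
    state. A negative gap means the subgroup scores below the reference group.
    """
    base = state_pct[reference]
    return {
        group: round(pct - base, 1)
        for group, pct in state_pct.items()
        if group != reference
    }


# Made-up percentages for one state, for illustration only (not EDFacts values).
example = {"all_students": 40.0, "ell": 12.5, "swd": 9.0, "low_income": 28.0}
gaps = achievement_gaps(example)
print(gaps)  # {'ell': -27.5, 'swd': -31.0, 'low_income': -12.0}
```

Computing the gap against the all-students value, rather than against "all students not in the subgroup", matches the grey-background design; a strict peer comparison would need the complementary group's percentage instead.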
Research Question
How does math proficiency vary across states for high school students? How does it vary across U.S. regions (i.e., Midwest, Northeast, South, and West)?
Data Source: State Assessments in Reading/Language Arts and Mathematics, School Year 2018-19 EDFacts Data Documentation. U.S. Department of Education, Washington, DC: EDFacts. Retrieved February 2, 2022, from http://www.ed.gov/edfacts
The South region has a relatively high percentage of students at or above math proficiency; in contrast, the West region has a relatively low percentage. There is no clear pattern for the Midwest and Northeast regions because math proficiency varies widely within them.
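The regional comparison rests on two numbers per region: a center and a spread, where "large variation within a region" shows up as a large standard deviation. A minimal sketch, assuming a hypothetical state-to-region mapping and made-up percentages (the state labels are placeholders, not real states):

```python
from collections import defaultdict
from statistics import mean, pstdev


def region_summary(pct_by_state, region_of):
    """Return {region: (mean, population standard deviation)} of state-level percentages."""
    groups = defaultdict(list)
    for state, pct in pct_by_state.items():
        groups[region_of[state]].append(pct)
    return {
        region: (round(mean(vals), 1), round(pstdev(vals), 1))
        for region, vals in groups.items()
    }


# Hypothetical states and made-up percentages, for illustration only.
region_of = {"s1": "South", "s2": "South", "w1": "West", "w2": "West", "m1": "Midwest", "m2": "Midwest"}
pct = {"s1": 30.0, "s2": 34.0, "w1": 20.0, "w2": 22.0, "m1": 15.0, "m2": 45.0}
summary = region_summary(pct, region_of)
print(summary)  # {'South': (32.0, 2.0), 'West': (21.0, 1.0), 'Midwest': (30.0, 15.0)}
```

In this toy example the Midwest mean sits between the other two regions, but its spread is much larger, which is the pattern the text describes as having no clear regional signal.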
Another way to display the variation in math proficiency across states is to show how far each state's mean diverges above or below the national mean. I prefer the simplicity of this map to the previous one. I chose red to represent below-average states and green to represent above-average states, and I think this map is easier for a general audience to understand.
This is the initial diverging plot, which annotates each state's mean difference from the national average. However, the numbers are too small to read, and the same information is already displayed by the geom_segment() lines, so I decided to use the other diverging bar plot.
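The diverging display encodes one number per state, the state mean minus the national mean, and uses its sign to choose the color. A minimal sketch, assuming the national mean is the unweighted mean of the state means (a student-weighted national figure would differ) and using hypothetical inputs:

```python
from statistics import mean


def diverging_values(state_means):
    """Return {state: (difference from national mean, bar color)}.

    Assumption: the national mean is the unweighted mean of the state means.
    Red marks below-average states and green above-average states, as in the text;
    gray for a state exactly at the mean is an added choice.
    """
    national = mean(state_means.values())
    result = {}
    for state, value in state_means.items():
        diff = round(value - national, 2)
        color = "green" if diff > 0 else "red" if diff < 0 else "gray"
        result[state] = (diff, color)
    return result


# Hypothetical state means, for illustration only.
bars = diverging_values({"state_a": 40.0, "state_b": 30.0, "state_c": 35.0})
print(bars)  # {'state_a': (5.0, 'green'), 'state_b': (-5.0, 'red'), 'state_c': (0.0, 'gray')}
```

With an unweighted national mean the differences always sum to zero, which is a quick sanity check on the computed bars.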
Research Question
How does proficiency in reading and language arts vary across states for high school students? How does it differ by student characteristics such as English Learner status, disability status, and low-income status?
Data Source: District Fiscal documentation 2018, School Year 2019. Retrieved February 2, 2022.
On average, Indiana spent the most on textbooks per student ($107), while Hawaii spent the least ($14).
District-level average spending per student is aggregated to the state level. Because we analyze textbook spending and student achievement only at the state level, I chose the state-level map as the final version.
Initially, we wanted to present overall district spending per student in one large U.S. map. However, that plan had two problems: (1) the plot took too long to render, and (2) one map held too much information, which might distract the audience from our main goal. Instead, we plotted district spending separately for each state and used the ggplotly() function so the audience can zoom in and out easily.
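District values can be combined into a state figure in two common ways, and the two disagree when large and small districts spend differently: a simple mean of the district per-student values, or an enrollment-weighted mean (total textbook dollars divided by total students). A sketch of both, assuming hypothetical records of (state, students, spending per student); which one this project used is not specified here:

```python
from collections import defaultdict


def state_spending(districts):
    """Aggregate district spending per student to the state level, two ways.

    districts: iterable of (state, n_students, spend_per_student) tuples (hypothetical layout).
    Returns {state: (simple mean of district values, enrollment-weighted mean)}.
    Assumes every state has at least one student in total.
    """
    per_student_sum = defaultdict(float)  # sum of district per-student values
    n_districts = defaultdict(int)        # number of districts per state
    dollars = defaultdict(float)          # total textbook dollars per state
    students = defaultdict(int)           # total enrollment per state
    for state, n, per_student in districts:
        per_student_sum[state] += per_student
        n_districts[state] += 1
        dollars[state] += per_student * n
        students[state] += n
    return {
        st: (round(per_student_sum[st] / n_districts[st], 2), round(dollars[st] / students[st], 2))
        for st in n_districts
    }


# One large low-spending district and one small high-spending district (made-up numbers).
print(state_spending([("X", 9000, 50.0), ("X", 1000, 150.0)]))  # {'X': (100.0, 60.0)}
```

The weighted figure describes spending for the typical student; the simple mean describes the typical district.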
Colorblind-Friendly Oregon (OR) Plot
This colorblind-friendly version of the Oregon graph really stands out: the vibrant yellow school district is one of the highest textbook spenders per student.
Research Question
What is the relationship between District Textbook Funding per Student and Students’ Math and RLA achievements?
Data Source: State Assessments in Reading/Language Arts and Mathematics, School Year 2018-19 EDFacts Data Documentation. U.S. Department of Education, Washington, DC: EDFacts. Retrieved February 2, 2022, from http://www.ed.gov/edfacts
States with relatively low textbook spending tend to have a lower percentage of students at or above math proficiency.
This is my final plot, in which I added a linear fit line and annotated the correlation between textbook spending and math achievement. I highlighted the top 3 and bottom 3 states by math achievement and colored the remaining states gray with some transparency. The plot is simple and highlights the most important information.
Borrowing an existing template from another GitHub repository, I tried to visualize the relationship between textbook spending and RLA achievement. I mapped textbook spending to point size in geom_point(). Even though the plot is attractive, the point sizes are hard to tell apart.
This is the initial plot from when I tried to visualize the relationship between achievement and textbook spending. Although all the information is there, the plot is too crowded with state names.
I like the simplicity of this plot and the dashed national-average line; however, the geom_point() sizes showing textbook spending per student are hard to differentiate and overlap each other.
This plot is clean and interesting because it highlights selected states based on the upper and lower criteria I specified in my code. However, I want to use only two colors to distinguish the top 3 and bottom 3 states; the variety of colors on the circles and state names is a little too distracting.
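The fitted line and annotated correlation correspond to an ordinary least-squares fit and Pearson's r; in ggplot2 such a line is commonly added with geom_smooth(method = "lm"), and r with cor(). The sketch below shows the same arithmetic in Python on made-up points, as an illustration rather than the author's code:

```python
from math import sqrt


def fit_and_correlate(xs, ys):
    """Return (slope, intercept, pearson_r) for the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)                    # spread of x
    syy = sum((y - my) ** 2 for y in ys)                    # spread of y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-movement of x and y
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Made-up (textbook $ per student, % proficient) pairs, not the project's data.
spend = [20.0, 40.0, 60.0, 80.0]
prof = [25.0, 30.0, 33.0, 40.0]
slope, intercept, r = fit_and_correlate(spend, prof)
print(f"slope={slope:.3f}, intercept={intercept:.2f}, r={r:.2f}")  # slope=0.240, intercept=20.00, r=0.99
```

The slope and r share the same numerator (sxy), so the line always tilts in the direction of the annotated correlation.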
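The two-color highlighting rule (top 3 and bottom 3 states by math achievement, everything else gray) can be expressed as a small labeling step before plotting. A sketch, assuming a hypothetical dict of state to percent proficient and a hypothetical two-color palette; ties are broken by sort order:

```python
def highlight_groups(pct_by_state, k=3):
    """Label the k highest and k lowest states; all other states get 'other'.

    Returns {state: 'top' | 'bottom' | 'other'}. Requires at least 2 * k states
    so the two highlighted groups cannot overlap.
    """
    if len(pct_by_state) < 2 * k:
        raise ValueError("need at least 2 * k states")
    ranked = sorted(pct_by_state, key=pct_by_state.get)  # ascending by percent proficient
    labels = {state: "other" for state in pct_by_state}
    for state in ranked[:k]:
        labels[state] = "bottom"
    for state in ranked[-k:]:
        labels[state] = "top"
    return labels


# Hypothetical palette: two highlight colors plus gray for the rest.
COLORS = {"top": "#1b9e77", "bottom": "#d95f02", "other": "gray"}

# Made-up scores for placeholder states, for illustration only.
scores = {f"state_{i}": float(v) for i, v in enumerate([12, 45, 30, 50, 8, 27, 33, 41])}
labels = highlight_groups(scores)
point_colors = {state: COLORS[group] for state, group in labels.items()}
print(sorted(s for s, g in labels.items() if g == "top"))  # ['state_1', 'state_3', 'state_7']
```

Labeling first and mapping labels to colors second keeps the palette to exactly two highlight colors, whatever the number of states.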
Variables:
LEAID - ID number for each school district
total - Total number of students in a school district
textbk_exp_per_student - Textbook Expenditure per Student
percentM - Percent of student population in a school district identifying as minorities
percentW - Percent of student population in a school district identifying as white
Data Source: https://usafacts.org/
Future Analyses: A plot showing how textbook expenditures have changed over time in school districts and states is being planned.
| | Total Number of Students per District | Textbook Expenditure per Student | Percent of Student Population Identifying as Minority | Percent of Student Population Identifying as White |
|---|---|---|---|---|
| total | | | | |
| textbk_exp_per_student | 0.00 | | | |
| percentM | 0.21*** | 0.05*** | | |
| percentW | -0.21*** | -0.05*** | -1.00*** | |
This table is a simple way to present the correlation statistics for the variables examined in response to Research Question 3. It is easy to print and reproduce for a general audience.
This is the basic version of the scatterplot used to check for patterns in the data. Although no patterns were found, I decided to create more polished versions of the plot for an online audience.
I chose a blue-green scale for the points and a light yellow background so the other colors would pop visually.
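The -1.00 entry and the mirrored signs in the percentM and percentW rows are what one would expect if each district's minority and white percentages sum to 100 (an assumption; the variable definitions above do not state it). Writing W = 100 - M and letting X be any other variable:

```latex
W_i = 100 - M_i \;\Longrightarrow\; W_i - \bar{W} = -\left(M_i - \bar{M}\right)

\operatorname{cov}(W, X) = -\operatorname{cov}(M, X), \qquad \sigma_W = \sigma_M

r_{W,X} = \frac{\operatorname{cov}(W, X)}{\sigma_W \, \sigma_X} = -\,r_{M,X},
\qquad
r_{W,M} = \frac{\operatorname{cov}(W, M)}{\sigma_W \, \sigma_M} = \frac{-\sigma_M^{2}}{\sigma_M^{2}} = -1
```

Under that assumption the percentW row carries no information beyond the percentM row, which matches the 0.21 / -0.21 and 0.05 / -0.05 pairs in the table.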
As both plots (p1 and p2) show, there was no apparent relationship between the variables. The points are mainly clustered in the $0 to $500 range regardless of a school district's percentage of minority or white students. Two outliers are school districts with total textbook expenditure over $3,000 per student. Because I could not tell from the data whether these might be private or specialized schools that actually spend that much on textbooks, I chose not to remove them from the data set being analyzed. Looking back, I wonder whether I should have chosen a different color scheme for p2 so that viewers would more easily realize that different variables are being analyzed.
This graph seemed like an interesting setup at first, but it is very hard to distinguish the slight size differences between the ovals and the circles, so viewers might not notice that the shapes carry meaning. The direction in which an oval leans shows the direction of the correlation, while circles mark pairs of variables with correlations close to 0. No significance asterisks are included for the correlations, which a more specialized audience would expect to see.
This initial version seems too busy, with its mix of large and small font sizes and its red lines. The red lines might also make viewers think of a negative correlation when in fact there is no strong or meaningful correlation between the variables being analyzed.
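One way to check whether keeping the two high-spending districts matters is to compare summary statistics with and without them, leaving the data itself untouched. A sketch, assuming a plain list of per-student spending values; the $3,000 cutoff comes from the text, and the numbers below are made up:

```python
from statistics import mean, median


def outlier_sensitivity(spend_per_student, threshold=3000.0):
    """Flag values above threshold and compare summaries with and without them.

    Nothing is removed from the input; the function only reports how much the
    flagged values move the mean and the median.
    """
    flagged = [v for v in spend_per_student if v > threshold]
    kept = [v for v in spend_per_student if v <= threshold]
    if not kept:
        raise ValueError("every value is above the threshold")
    return {
        "n_flagged": len(flagged),
        "mean_all": round(mean(spend_per_student), 1),
        "mean_without": round(mean(kept), 1),
        "median_all": median(spend_per_student),
        "median_without": median(kept),
    }


values = [100.0, 150.0, 200.0, 250.0, 300.0, 3500.0, 4000.0]  # made-up, not district data
res = outlier_sensitivity(values)
print(res)
```

In this toy case two extreme values pull the mean far more than the median, which is why a median-based summary is a reasonable companion when outliers are kept.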
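Significance asterisks like those in the correlation table can be derived from r and the number of districts n using the test statistic t = r * sqrt((n - 2) / (1 - r^2)). A sketch, assuming a normal approximation to the t distribution for the two-sided p-value (reasonable only for large n; for small samples, an exact test such as R's cor.test() is the safer route) and the common thresholds *** p < .001, ** p < .01, * p < .05:

```python
from math import erfc, sqrt


def significance_stars(r, n):
    """Return r formatted to two decimals with significance asterisks for H0: rho = 0.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) and a normal approximation for the
    two-sided p-value, so it is only appropriate for large n.
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    if abs(r) >= 1.0:
        p = 0.0  # perfect correlation: treat as maximally significant
    else:
        t = r * sqrt((n - 2) / (1 - r * r))
        p = erfc(abs(t) / sqrt(2))  # two-sided p-value under the normal approximation
    stars = "***" if p < 0.001 else "**" if p < 0.01 else "*" if p < 0.05 else ""
    return f"{r:.2f}{stars}"


print(significance_stars(0.05, 10000))  # 0.05***
print(significance_stars(0.05, 100))    # 0.05
```

The usage lines show why a correlation as weak as 0.05 can still earn three asterisks: with thousands of observations, statistical significance says little about practical strength.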
This second correlation matrix is missing the lines from the scatterplots, but it is visually cleaner, despite some overlap in the x-axis labels. The tabs on the borders stand out nicely, though.
Color is easier to change with the pairs.panels() function. The scatterplots now have visible lines, and the colors were customized to make certain aspects stand out. The correlation coefficients now share the same font size. The variable labels are back on the diagonal, so it would be worth finding a way to customize them to stand out more.